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Abstract 

Background: Pancreatic ductal adenocarcinoma (PDAC) is known by its aggressiveness and lack of effective 
therapeutic options. Thus, improvement in current knowledge of molecular changes associated with pancreatic 
cancer is urgently needed to explore novel venues of diagnostics and treatment of this dismal disease. While there 
is mounting evidence that long noncoding RNAs (IncRNAs) transcribed from intronic and intergenic regions of the 
human genome may play different roles in the regulation of gene expression in normal and cancer cells, their 
expression pattern and biological relevance in pancreatic cancer is currently unknown. In the present work we 
investigated the relative abundance of a collection of IncRNAs in patients' pancreatic tissue samples aiming at 
identifying gene expression profiles correlated to pancreatic cancer and metastasis. 

Methods: Custom 3,355-element spotted cDNA microarray interrogating protein-coding genes and putative 
IncRNA were used to obtain expression profiles from 38 clinical samples of tumor and non-tumor pancreatic 
tissues. Bioinformatics analyses were performed to characterize structure and conservation of IncRNAs expressed in 
pancreatic tissues, as well as to identify expression signatures correlated to tissue histology. Strand-specific reverse 
transcription followed by PCR and qRT-PCR were employed to determine strandedness of IncRNAs and to validate 
microarray results, respectively. 

Results: We show that subsets of intronic/intergenic IncRNAs are expressed across tumor and non-tumor 
pancreatic tissue samples. Enrichment of promoter-associated chromatin marks and over-representation of 
conserved DNA elements and stable secondary structure predictions suggest that these transcripts are generated 
from independent transcriptional units and that at least a fraction is under evolutionary selection, and thus 
potentially functional. 

Statistically significant expression signatures comprising protein-coding mRNAs and IncRNAs that correlate to PDAC 
or to pancreatic cancer metastasis were identified. Interestingly, loci harboring intronic IncRNAs differentially 
expressed in PDAC metastases were enriched in genes associated to the MAPK pathway. Orientation-specific RT- 
PCR documented that intronic transcripts are expressed in sense, antisense or both orientations relative to protein- 
coding mRNAs. Differential expression of a subset of intronic IncRNAs [PPP3CB, MAP3K14 and DAPK1 loci) in 
metastatic samples was confirmed by Real-Time PCR. 

Conclusion: Our findings reveal sets of intronic IncRNAs expressed in pancreatic tissues whose abundance is 
correlated to PDAC or metastasis, thus pointing to the potential relevance of this class of transcripts in biological 
processes related to malignant transformation and metastasis in pancreatic cancer. 

Keywords: pancreatic cancer, molecular markers, noncoding RNAs, intronic transcription, metastasis, MAPK , path- 
way, cDNA microarrays 
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Background 

Pancreatic ductal adenocarcinoma (PDAC) is the most 
common pancreatic neoplasm and accounts for > 85% 
of pancreatic tumor cases [1]. PDAC is a devastating 
disease with very poor prognosis for which the only 
curative treatment is resection surgery [2]. However, 
only 15-20% of patients have resectable pancreatic 
tumor, and from these only 20% presents a 5-year survi- 
val, which results in an average 5 -year survival rate of 3- 
5% [1]. PDAC aggressiveness is mainly associated to the 
lack of early diagnosis tools and the limited response to 
available treatments [2]. 

Large-scale gene expression studies of tumor samples 
have been extensively employed to delineate the mole- 
cular pathways and cellular processes involved in tumor- 
igenesis and progression of PDAC [3] and to search for 
novel biomarkers for diagnosis and molecular targets for 
therapeutic intervention in pancreatic cancer [4]. In 
spite of the wealth of information generated in recent 
years on the most frequent molecular alterations found 
in PDAC [5], there are still important open question in 
pancreatic cancer biology such as the profound resis- 
tance of primary and metastatic PDAC to chemo- and 
radiotherapy [6]. Regarding the identification of molecu- 
lar markers for pancreatic cancer diagnostic/prognostic, 
while some promising candidate genes have been pro- 
posed [4], none have been proven effective to signifi- 
cantly improve early detection and to reduce mortality/ 
morbidity of the disease. Thus, a better understanding 
of the molecular basis of pancreatic cancer is required 
for the identification of more effective diagnostic mar- 
kers and therapeutic targets. 

Over the last decade, advances in genome-wide ana- 
lyses of the eukaryotic transcriptome have revealed 
that the majority of the human genome is transcribed, 
producing large numbers of long (> 200 nt) noncoding 
RNAs (IncRNAs) mapping to intronic and intergenic 
regions [7-10]. These include subsets of polyadenylated 
and non-adenylated transcripts that accumulate differ- 
ently in the nucleus and cytoplasm of cells [10,11]. 
While only a small fraction of IncRNAs have been 
characterized in detail, it is clear that these transcripts 
may act through diverse molecular mechanisms and 
play regulatory and structural roles in important biolo- 
gical processes, such as in genomic imprinting, chro- 
mosome inactivation, cell differentiation and 
development, cell proliferation, protein nuclear import, 
organization of nuclear domains and apoptosis (see 
[12] for a review). 

Altered expression of IncRNAs has been documented 
in different types of human cancer [13-15] prompting 
an increasing interest in their use as biomarkers for 
diagnosis and prognosis as well as potential therapeutic 



targets [14,16-19]. Increased expression of the IncRNA 
MALAT-1 has been observed in several types of tumors, 
including metastatic non-small cell lung cancer [19]. 
Recently, augmented levels of HOTAIR in primary 
breast tumors were shown to correlate with breast can- 
cer invasiveness and metastasis [18]. Measurement of 
IncRNA PCA3 in patient urine samples has been shown 
to allow more sensitive and specific diagnosis of prostate 
cancer than the widely used marker prostate-specific 
antigen (PSA) [16]. The IncRNA HULC is highly 
expressed in hepatocarcinoma patients and detected in 
the blood by conventional PCR methods [20]. 

There are several reports of aberrant expression of 
microRNAs in PDAC [21,22], and there is potential in 
their use as biomarkers for disease diagnosis [23,24]. 
However, there is a paucity of information regarding the 
expression of IncRNAs in pancreatic cancer. In an inter- 
esting study performed by Ting et al. it was observed 
the aberrant overexpression of satellite repeat RNAs 
(HSATII) ranging from 100 to 5000 nt in patients with 
PDAC [25]. Interestingly, detection of HSATII by RNA 
in situ hybridization was able to correctly diagnose 
PDAC in tumor biopsies, including cases in which the 
histopathology was non-diagnostic [25] . 

Our group has previously shown that most (at least 
74%) annotated protein-coding gene loci generate intra- 
genic IncRNAs that map to intronic regions [26]. Possi- 
ble relevance of intronic IncRNAs to neoplastic 
processes was proposed following the observation that 
subsets of these transcripts are present in gene expres- 
sion signatures correlated to the degree of malignancy 
in prostate cancer [17] or to tissue histology in head 
and neck tumors [27] and renal cell carcinoma [28]. In 
addition, a number of intronic IncRNAs were found to 
be regulated by androgen stimulation of cultured pros- 
tate cancer cells [29], indicating that these transcripts 
are expressed in a regulated manner and thus, corrobor- 
ating the idea that intronic IncRNAs are biologically 
relevant. 

In this study, we used a custom cDNA microarray 
platform with probes for IncRNAs expressed from intro- 
nic and intergenic regions of the human genome, as 
well as for a selected set of cancer-related protein-cod- 
ing genes to generate expression profiles from a collec- 
tion of tumor and non-tumor pancreatic tissue samples. 
Expression of intronic/intergenic IncRNAs subsets was 
detected across all samples tested. Enrichment of pro- 
moter-associated chromatin marks indicate that these 
transcripts originate from independent transcriptional 
units. Over-representation of conserved DNA elements 
and stable secondary structure predictions suggest that 
at least a fraction of these transcripts are under evolu- 
tionary selection and thus potentially functional. 
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Importantly, we identified expression signatures com- 
prising long noncoding RNAs that are significantly cor- 
related with primary and metastatic ductal pancreatic 
adenocarcinoma. This suggests that IncRNAs are modu- 
lated during tumorigenesis and tumor progression and 
therefore may participate in molecular processes rele- 
vant to malignant transformation and metastasis in pan- 
creatic cancer. 

Results 

Long noncoding RNAs from intronic and intergenic 
regions are expressed in neoplastic and non-tumor 
pancreatic tissues 

In this work, a custom spotted cDNA microarray with 
approximately 4,000 elements was used to investigate 
the expression patterns of a collection of protein-cod- 
ing transcripts and putative noncoding RNAs in clini- 
cal samples of primary and metastatic tumor, chronic 
pancreatitis and histologically normal pancreatic tissue. 
This array platform has been described previously 
[17,28] and contains probes that interrogate 2,371 
RefSeq mRNAs from genes associated with cancer in 
the literature, as well as 984 transcripts mapping to 
intronic or intergenic regions of the genome and to 
known IncRNAs. Fluorescent cRNA targets generated 
from 38 pancreatic tissue samples (15 primary adeno- 
carcinoma, 9 histologically normal adjacent tissue, 6 
metastatic samples and 8 chronic pancreatitis) were 
individually hybridized to microarrays in replicate. 
After data filtering (see Methods for details), 1,607 
transcripts were detected as expressed in at least one 
histological type, being 1,267 protein-coding mRNAs 
and 340 putative noncoding RNAs, including tran- 
scripts with no overlap to RefSeq exons, i.e, mapping 
to intronic and intergenic regions. Only candidate 
IncRNAs sequences that showed genomic alignments 
with at least 90% identity and coverage were further 
analyzed, resulting in 335 transcripts (22 known 
IncRNAs, 240 putative IncRNAs mapped to intronic 
regions and 73 to intergenic regions). 



Expression of comparable fractions of protein-coding 
mRNAs and putative long noncoding transcripts map- 
ping to intronic and intergenic regions was detected in 
all histological tissue types (Table 1). The fraction of 
intronic IncRNAs detected as expressed in the microar- 
ray (240/722 = 0.33) is comparable to that of known 
RefSeq IncRNAs (22/74 = 0.30) and intergenic IncRNAs 
(73/188 = 0.39), and lower than the fraction of 
expressed protein-coding mRNAs (1267/2371 = 0.53). 
The smaller fraction of IncRNAs detected in pancreatic 
tissues (0.30-0.39) compared to protein-coding mRNAs 
(0.53) reflects the observation from other studies that 
noncoding RNAs are generally less abundant and more 
tissue-specific than protein coding mRNAs [9,26]. In 
fact, we observed that the IncRNAs detected in pancrea- 
tic tissue samples by array hybridization were on average 
less abundant than protein-coding transcripts (average 
intensities 24.9 and 31.6, respectively). 

Of the 240 gene loci harboring intronic IncRNAs that 
were detected in pancreatic tissues, only 62 had array 
probes interrogating exons of mRNAs from the same 
loci. From these, 31 (50%) were detected only in intronic 
regions, pointing to a subset of IncRNAs that conceiva- 
bly are generated by independent intronic transcription 
rather than pre-mRNA splicing. Thirty one loci were 
detected by both exonic and intronic probes (50%). For 
each of these loci, Pearson correlation between the 
expression of the IncRNA and mRNA across all pan- 
creatic tissue samples was calculated. Correlations were 
generally low (-0.5 < r < 0.5 for 27 out 31 loci), with 11 
loci displaying a negative correlation between expression 
of the intronic IncRNA and the mRNA, and 20 showing 
a positive correlation. 

To obtain further information regarding the correla- 
tion between the 240 intronic IncRNAs expressed in 
pancreatic tissues and the adjacent exons from the same 
loci we analyzed their expression in a set of nine RNA- 
seq libraries [30]. For each locus in each library, the 
number of tags was normalized by RPKM (Reads Per 
Kilobase of exon model per Million mapped reads). 



Table 1 Gene expression detected in the microarrays according to probe type and pancreatic tissue histology 

Detected as expressed in 



Type 


# probes in the array 


NT 


T 


M 


CP 


# expressed probes * 






(n = 9) 


(n = 15) 


(n = 6) 


(n = 8) 




protein-coding mRNA 


2371 


1106 


1167 


1198 


1230 


1267 


Known IncRNA (RefSeq) 


74 


20 


19 


22 


18 


22 


Intronic IncRNA 


722 


206 


202 


238 


235 


240 


Intergenic IncRNA 


188 


68 


68 


74 


77 


78 


Total 


3355 


1400 


1456 


1532 


1560 


1607 



*To be considered as expressed, a probe signal should be detected above the median array intensity value in at least 75% of samples from at least one 
histological type. 

NT, non-tumor pancreatic tissue; T, primary pancreatic adenocarcinoma; M, metastases from primary pancreatic tumors; CP; chronic pancreatitis. 
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Pearson correlations between each intronic IncRNA and 
the adjacent upstream/downstream exons were calcu- 
lated if all elements (intronic IncRNA, upstream and 
downstream exons) were detected in at least 4 out 9 
RNAseq libraries. Seventy five loci (75/240, 31%) satis- 
fied these criteria and were further analyzed. As 
expected, we found a high correlation between the 
expression of exons flanking intronic IncRNAs (66/75 
with r > 0.5). We found that about a third of exon/ 
intron pairs showed positive correlation of expression 
(24/75, r > 0.5). The expression of more than half of 
exon/intron pairs were poorly correlated (46/75, -0.5 < r 
< 0.5), and a small fraction was negatively correlated (5/ 
75, r < -0.5). The overall low correlation observed 
between the expression of intronic IncRNAs and adja- 
cent exons suggest that for the most part intronic 
IncRNAs are processed and accumulate in the cell at 
rates distinct from mRNAs produced in the same loci, 
arguing against them being simply remainings of spli- 
cing lariats. 

To gain further insight into the expression pattern of 
the 335 putative IncRNAs, we investigated their expres- 
sion in other tissue types using publicly available RNA- 
seq datasets generated from nine different tissue 
histologies [30]. By cross-referencing the genome map- 
ping coordinates of the pancreatic-expressed IncRNAs 
with coordinates of the RNAseq reads we found that 
approximately 80% of the former (267/335) are detected 
in at least one other human tissue (Figure 1). 

Coding potential of the 335 sequences mapping to 
intronic and intergenic regions was investigated using 
the Coding Potential Calculator (CPC) software [31]. 
This analysis showed that most sequences (322/335, 
96%) have little or no protein coding potential. Thus, we 
suggest that most of the intronic and intergenic tran- 
scripts detected in pancreatic tissues are indeed noncod- 
ing RNAs. 

To document the length of the intronic transcripts 
expressed in pancreatic samples we compared the set of 
intronic RNA sequences (n = 240) with sequences 
resulting from the assembly of ESTs and mRNAs depos- 
ited in GenBank that map to intronic regions of the 
genome, previously generated in our group and which is 
available as a UCSC Genome Browser custom annota- 
tion track [26]. We found that 190 out 240 intronic 
transcripts (79%) are represented by an assembled 
sequence contig. The mean length of the intronic con- 
tigs is 779 bp, whereas the individual ESTs have a mean 
length of 428 bp, suggesting that the ESTs spotted on 
the microarray are partial sequences of longer noncod- 
ing RNA transcripts. 

We also investigated the proximity of 73 intergenic 
IncRNAs to UTRs of annotated genes to evaluate if 
these transcripts could represent untranslated regions of 



IncRNAs detected 
in pancreas 




51 (15%) 



Total 335(100%) 



RNA-Seq Burge libraries (Wang etal., 2008) 

# pancreatic IncRNAs 2 03 173 157 150 138 126 115 108 100 
detected in each (61 o /o) (52 o /o) (47 o /o) (45 o /o) (41 o /o) (38 o /o) (34 o /o) (32 o /o) (30 o /o ) 

library(%) 

Library size 1Q 6 1g g 16 2 U J 1/ 8 ^ Q ^ 6 1Q 6 14 2 

(in million) 

# RNAseq libraries that 
each pancreatic IncRNA 

was detected 0 1 2 

Figure 1 Intronic and intergenic IncRNAs detected 
pancreatic tissues comprise both tissue-specific and broadly 
expressed transcripts. Genomic coordinates of 335 intronic/ 
intergenic transcripts detected as expressed in pancreatic tissues 
were cross-referenced with genomic coordinates of transcripts 
detected in RNAseq libraries from nine different tissues [30]. 
LncRNAs were grouped and colored by the number of tissue 
libraries where their expression was detected (Y axis). RNAseq 
libraries are ordered in the x axis according to the number of 
pancreatic IncRNAs detected in each library. 



incomplete mRNAs. We found that 14 intergenic tran- 
scripts (14/73, i.e. 19%) map within 1 kb from a known 
3'or 5' UTR and could potentially extend the 3' or 5' 
untranslated region of a known protein-coding mRNA. 
The remaining 59 transcripts map at least 1 kb away 
from a known mRNA and possibly constitute yet unan- 
notated intergenic noncoding RNAs. 

To investigate if the intronic and intergenic RNAs 
detected in pancreatic tissue could be precursors of 
small regulatory ncRNAs, we compared the set of 335 
IncRNAs expressed in pancreas to microRNA and 
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snRNA sequence databases [32,33]. No significant simi- 
larity to known small RNA was found, except for one 
sequence that mapped to the SNOR89 locus. Consider- 
ing that the average length of micro RNA precursors (> 
1,000 nt) is greater than the average EST length (468 nt) 
we extended the genomic coordinates of probe 
sequences by 1 kb at both ends and repeated the 
sequence comparison with known small RNA datasets. 
Using this approach, we found four putative extended 
EST sequences that show high similarity to seven addi- 
tional small RNAs: hsa-rnir-1259, hsa-mir-326, hsa-mir- 
4269, hsa-mir-675, SNORD12, SNORD12B and 
SNORD12C. 

Sequence conservation among species is generally 
viewed as an indication of functional significance of a 
given genomic feature. We searched for evidence of 
sequence conservation within the set of intronic/inter- 
genic IncRNAs expressed in pancreatic tissues by com- 
paring their mapping coordinates with those from 
conserved DNA elements in vertebrates (phastCons 
46way vertebrates), mammals (phastCons 46 way placen- 
tal) and primates (phastCons 46way primates) obtained 
from the UCSC genome browser. After normalization 
by the number of conserved elements present in each 
group, relatively greater overlap with conserved DNA 
elements was observed within primate, mammalian and 
vertebrate sequences, in this order (Additional File 1, 
Figure SI). The overlap of intronic/intergenic RNAs 
with evolutionarily conserved DNA elements was greater 
than the expected by chance alone, as judge by the over- 
lap attained with a set of randomly selected intronic/ 
intergenic DNA sequences with same length and CG% 
content (Fishers exact test p < 0.05, Additional File 1, 
Figure SI). In addition, a fraction of the IncRNAs map- 
ping to intronic/intergenic regions (49 sequences out 
335 analyzed, 15%) appear to fold into stable RNA sec- 
ondary structures (P > 0.5) (http://verjol02.iq.usp.br/ 
sites/tahira/structures.html), as predicted by the RNAz 
program [34]. Altogether, these observations provide 
additional support to the notion that at least a fraction 
of the noncoding transcripts mapping to intronic/inter- 
genic regions may exert functional roles in pancreatic 
cells. 

Enrichment of promoter-associated chromatin marks 
(H3K4me3) and start sites of capped transcripts suggest 
that intronic IncRNAs are independent transcriptional 
units 

Given the paucity of information about the biogenesis of 
IncRNAs originated from intronic and intergenic 
regions, we searched for regulatory elements in the gen- 
ome that could be associated to their transcriptional 
control. First, we investigated the distribution of tri- 
methylation of lysine 4 in histone 3 (H3K4me3), a 



chromatin modification associated with regions of tran- 
scription initiation [35], in the vicinity of intronic/inter- 
genic IncRNAs. Genomic coordinates of H3K4me3 
marks measured in 13 cell lineages [35] were obtained 
from the UCSC Genome Browser. Only H3K4me3 
marks with a p < 10" 5 were used to limit the experimen- 
tal noise. The nearest H3K4me4 mark relative to the 
known boundaries of intronic/intergenic IncRNAs 
(based on sequenced ESTs) expressed in pancreatic tis- 
sues was selected and the distance annotated. As a con- 
trol, the same analysis was performed using 100 random 
sets of intronic or intergenic DNA sequences with same 
length and GC content. 

An enrichment of H3K4me3 marks was observed clo- 
ser to the known boundaries for the set of intronic tran- 
scripts expressed in pancreatic tissue samples (Figure 2, 
panel A, blue bars). The distance distribution of 
H3K4me3 marks relative to known boundaries of intro- 
nic transcripts was significantly different (KS test, high- 
est p < 0.05) from that observed for a random set of 
sequences (same length and %CG), indicating that it is 
not explained by chance alone. The same analysis was 
performed with the set of expressed intergenic regions. 
Although we observed a higher frequency of H3K4me3 
marks closer to the known boundaries of the intergenic 
transcripts, we found no statistically significant differ- 
ence in their distribution relative to the random control 
set (Figure 2, panel B, green bars). 

As expected, we observed a higher frequency of 
H3K4me3 marks closer to the known boundaries of pro- 
tein-coding mRNAs (Figure 2, panel A, red bars). This 
distribution is statistically different from the one 
obtained with a control comprising a random sequence 
set (KS test, highest p < 0.01). No statistically significant 
difference was observed between pancreas-expressed 
intronic/intergenic IncRNAs and mRNAs regarding the 
distributions of promoter-associated H3K4me3 marks, 
indicating that these distributions are similar. The 
enrichment of promoter-associated H3K4me3 at the 
vicinity of intronic/intergenic pancreatic-expressed tran- 
scripts argues that these transcripts are independent 
transcriptional units. 

We also investigated the distribution of annotated 
CpG islands relative to EST probes representing pro- 
tein-coding mRNAs and noncoding intronic/intergenic 
RNAs expressed in pancreatic tissues. To pursue this 
analysis we used the genomic coordinates of CpG 
islands available as a UCSC genome browser track. First, 
we cross-referenced the coordinates of annotated CpG 
islands with those of EST probes representing mRNAs 
expressed in pancreatic tissues, which showed an enrich- 
ment towards EST boundary coordinates (Figure 2, 
panel C, red bars), significantly different from the distri- 
bution observed by a random sequence set (KS test, p < 
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H3K4me3 - intronic IncRNA 



B 




1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



CpG - intronic IncRNA 




1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



CAGE TAGs (PolyA+) - Intronic IncRNA 




1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



H3K4me3 - intergenic IncRNA 




1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



CpG - intergenic IncRNA 



1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



CAGE TAGs (PolyA+) - Intergenic IncRNA 




JXi-.n J-i-ri In- 



1kb 2kb 3kb 4kb 5kb 6kb 7kb 8kb 9kb 10kb 
Distance (kb) 



Figure 2 Genomic loci encoding intronic IncRNAs are enriched in promoter-associated histone marks and start sites of capped 
transcripts. Distance distribution (X axis) of promoter-associated chromatin marks H3K4me3 binding sites (panels A, B), CpG islands (panels C, D) 
and CAGE Tags (panels E, F) relative to genomic coordinates of intronic (blue bars) and intergenic (green bars) transcripts expressed in 
pancreatic tissues (Y axis) were calculated. For comparison, distribution distances were calculated for an equal number of protein-coding mRNAs 
(red bars) and for randomly selected intronic or intergenic genomic sequences with the same length and % GC of pancreas expressed IncRNAs 
(light gray bars). 



0.001). No statistically significant association with CpG 
islands was observed for intronic or intergenic sequence 
sets relative to random sets with same length and CG% 
content (KS test, p-value > 0.05) (Figure 2, panels C and 
D, blue and green bars). 

We also compared the known start sites of intronic/ 
intergenic IncRNAs with CAGE tags generated from 
poly(A+) RNA from 6 different cell lineages (RIKEN). 
We note that this set does not include CAGE libraries 
derived from pancreatic tissues. As pre-processing, coor- 
dinates of overlapped tags were clustered and only 



clusters containing at least 5 tags were considered for 
further analysis. Next, we calculated the distance of the 
closest CAGE tag cluster to intronic/intergenic 
IncRNAs, protein-coding mRNAs, and to random geno- 
mic sequences. A significant enrichment (KS test, p < 
0.05) of CAGE tags within lkb of the known start of 
intronic IncRNAs expressed in pancreatic tissues was 
observed (Figure 2, panel E, blue bars). Although a 
higher frequency of CAGE tags closer to the known 
start site of intergenic IncRNAs was observed, the 
enrichment was not statistically significant when 
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compared to the random control set (Figure 2, panel F, 
green and gray bars). 

Identification of a gene expression signature correlated 
to ductal pancreatic cancer comprising protein-coding 
and IncRNAs 

To gain further insights on the putative biological rele- 
vance of intronic/intergenic noncoding RNAs in pan- 
creatic cancer we investigated their relative expression 
in tumor and non-tumor pancreatic tissues. Identifica- 
tion of genes specifically deregulated in malignant pan- 
creatic epithelial cells is frequently confounded by an 
augmented stromal component in the latter due to the 
presence of proliferating stromal cells and infiltrating 
inflammatory cells [36,37]. A similar desmoplastic reac- 
tion is observed in chronic pancreatitis [38,39]. To favor 
the identification of genes specifically altered in neoplas- 
tic pancreatic cells, we performed a two-class analysis 



comparing the expression profiles of 15 primary adeno- 
carcinoma samples with nine histologically normal tissue 
fragments adjacent to tumors combined to eight sam- 
ples of chronic pancreatitis. Using this approach, we 
found 147 transcripts differentially expressed in pancrea- 
tic tumor samples relative to non-tumor tissues (FDR < 
10%). This expression signature comprised 104 protein- 
coding mRNAs and 43 IncRNAs, being 34 intronic and 
9 intergenic transcripts (See Additional file 2, Table SI 
for a complete list). As shown in Figure 3, except by 
one sample (210, primary tumor), the 147-gene signa- 
ture efficiently discriminated tumor and non-tumor tis- 
sues. Conceivably, the prevalence of an inflammatory 
component in the 210 sample could explain this sample 
showing an expression profile more similar to chronic 
pancreatitis samples. 

Next, we performed a meta-analysis to compare the list 
of protein-coding mRNAs presented in the pancreatic 




□ Non tumoral Tissue 

■ Primary Tumor 

[x] Chronic Pancreatitis 




■ 
i 
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Figure 3 A gene expression signature of pancreatic adenocarcinoma. A two-class statistical analysis (see Methods) identified 147 transcripts 
(rows) differentially expressed (FDR <10%) between primary adenocarcinoma samples and histologically normal and chronic pancreatitis samples 
combined (columns). Patient ID numbers are shown below the columns. Forty three transcripts mapping to intronic or intergenic regions were 
identified (43/147, i.e. 29%). Expression level of each gene is represented by the number of standard deviations above (red) or below (blue) the 
average value for that gene across all samples. Samples are ordered according to their individual correlation to the average profile of the 
primary tumor samples. Sample tissue histology is shown below each patient ID. 
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tumor expression signature with those identified in 
other gene expression studies with clinical samples of 
pancreatic cancer, retrieved from the Pancreatic 
Expression Database [40] (see Additional file 3, Table 
S2). Twenty four out 104 protein-coding transcripts 
detected in our analysis (24/104, i.e. 23%) were 
reported in at least one of the 12 studies, comprising 
15 different analyses, deposited in the Pancreatic 
Expression Database. From these, expression changes 
of 17 genes were confirmed by other studies, whereas 
3 showed partial agreements and 4 showed an inverted 
pattern of expression. Confirmed genes included genes 
already reported in the literature and proposed as bio- 
markers of pancreatic cancer such as S100A6, TIMP1, 
NF-kB, VCL and S100P [5,41-47]. It is worth mention- 
ing that overexpression of S100P was detected in 11 
different studies (Additional file 3, Table S2). We also 
detected upregulation of MBOAT in pancreatic tumors. 
Increased expression of MBOAT in ductal pancreatic 
adenocarcinoma has been already reported and was 
shown to inversely correlate to patient survival in pan- 
creatic cancer [39]. 

A gene enrichment analysis using the DAVID analysis 
suite [48] using as input either the list of protein-coding 
mRNAs or of intronic transcripts differentially expressed 
in pancreatic tumors was performed to investigate the 
over-representation of specific molecular functions, bio- 
logical processes and cellular components of the Gene 
Ontology annotation [49]. For this analysis, intronic 
transcripts were annotated according to the gene locus 
where they map on the genome. Only categories having 
corrected EASE score < 0.05 [48] were considered as 
overrepresented. 

Among the set of 104 protein-coding mRNAs differ- 
entially expressed in pancreatic tumors we found an 
enrichment of gene categories encoding proteins 
involved in "focal adhesion" (p < 0.03; TRIP6, TRIM25, 
VCL, SDC1, ARPC2, DLC1, CDH1, ITGB5), "RNA trans- 
port and localization " (p < 0.01; THOC7, THOC2, RAN, 
NUP8S, THOC3) and localizing to "basolateral plasma 
membrane" {p < 0.05; NOTCH4, ARPC1B, CDH1, 
TRIP6, VCL, TRIM25, SDC1, DLC1). Deregulation in 
pancreatic cancer of genes encoding proteins involved in 
focal adhesion has already been reported in the litera- 
ture [39]. Noteworthy, the gene category "RNA trans- 
port and localization" comprises genes associated to the 
TREX-complex (THOC2, THOC3). Increased expression 
of this complex (Thocl) in breast cancer correlates with 
tumor size and the metastatic state of the tumor pro- 
gression [50], thus suggesting that modulation of the 
TREX-complex could also have a role in pancreatic can- 
cer. No enriched gene category was found amongst gene 
loci that harbor differentially expressed intronic 
IncRNAs. 



Ingenuity Pathway Analysis (IP A) [51] was used to 
identify pathways and gene networks represented 
amongst the sets of protein-coding mRNAs identified in 
the pancreatic tumor gene expression signature. The 
most enriched network, "cellular movement, cell-to-cell 
signaling interactions and endocrine system" (p < 10 ~ 43 ) 
comprised 23 differentially expressed transcripts and 
included most genes represented in the enriched gene 
categories identified using DAVID (see Additional file 4, 
Figure S2). Gene networks associated with "cellular 
movement, skeletal and muscular system development 
and function and inflammatory response" (17 genes, p < 
10" 29 ) and "carbohydrate metabolism, small molecule 
biochemistry and infectious disease" (16 genes, p < 10" 
28 ) were also identified. 

Identification of genes correlated to metastasis in 
pancreatic cancer 

A hallmark of pancreatic cancer is the high prevalence 
of metastatic disease, whose molecular basis is poorly 
understood. To search for protein-coding and long non- 
coding RNAs with expression levels correlated to the 
metastatic phenotype in pancreatic cancer, we compared 
expression profiles from 15 primary adenocarcinoma 
samples with those obtained from 6 distant metastases 
originated from primary pancreatic adenocarcinoma. 
Metastatic samples were collected from secondary 
tumors appearing in different target sites (1 from perito- 
neum, 1 from ganglion, 4 from liver), from different 
patients. Using a significance threshold of FDR < 5%, we 
identified a metastasis-associated signature comprising 
355 differentially expressed transcripts (Figure 4). From 
these, 221 are protein-coding mRNAs and 134 are non- 
coding RNAs (134/355, 38% of signature), from which 
101 map to intronic, 27 to intergenic genomic regions 
and 6 are known IncRNAs (a complete list is available 
as Additional file 5, table S3). 

Gene enrichment analysis using protein-coding 
mRNAs differentially expressed in metastatic samples 
identified the over-representation of genes involved in 
"nucleic acid transport" and "RNA localization " (p < 
0.03; THOC7, THOC2, RAN, NUP85, THOC3, NUP88). 

A similar analysis performed with gene loci harboring 
intronic IncRNAs differentially expressed in metastasis 
showed enrichment of gene categories pertaining to 
"MAPK signaling pathway" (p < 0.03; ARRB1, ATF2, 
MAPK1, MAP2KS, MAP3K1, MAP3K14, PPP3CB, 
RAPGF2 and TGFfiR2), "phosphate metabolic process" 
(p < 0.05; ABL2, ENPP2, PTEN, CSNK1D, TYK2, 
MAPK1, MAP2K5, MAP3K1, MAP3K14, PPP3CB, 
PPP2R2A, PASK, TNK2, DAPK1 and TGFpR2), "non- 
membrane-bounded organelle" (p < 0.02; ABL2, GPHN, 
ITPR1, SORBS 1, TYK2, MAPK1, MAP2K5, MAP3K1, 
NDRG1, DST, MCPH1, USH1C, MAEA, BBSS, SLC4A7, 
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Figure 4 A gene expression signature correlated to metastasis in pancreatic adenocarcinoma. Three hundred fifty five transcripts (rows) 
identified as differentially expressed (FDR < 5%) between metastatic (dark box) and primary tumor (dark gray) samples from 21 patients 
(columns). Patient ID numbers are shown below the columns. One hundred thirty four intronic and intergenic IncRNAs were identified, 
comprising 38% of the metastasis signature. Expression level of each transcript is represented by the number of standard deviations above (red) 
or below (blue) the average value across all samples. Samples are ordered according to their individual correlation to the average profile of 
primary tumor samples. Tissue histology is shown below each patient ID. 



RAPH1, CNN 3, NR2C2, DMD, DAZAP1, PHF12, 
NOP58, ATF3, ALMS1, ST0N2, DAPK1 and MY05A) 
and "actin filament-based process"^ < 0.05; ABL2, 
SORBS1, PACSIN2, CNN3, MY05A and DST). 

Ingenuity Pathway Analysis identified significantly 
enriched gene networks amongst protein-coding genes 
differentially expressed in metastasis. The most enriched 
gene network of differentially expressed protein-coding 
mRNAs (p < 10" 41 ) included genes related to "cellular 
movement, gene expression and immune cell trafficking" 
(see Additional file 6, Figure S3). Among these we found 
that up regulation of S100A4, NCAM1 and LIMK1 had 



already been associated with metastatic behavior in pan- 
creatic cancer [52-54]. While the remaining genes in the 
network had not been associated with metastasis in pan- 
creatic cancer yet, most of them have previously shown 
to be involved with malignancy or metastatic behavior 
in other types of cancer (see Additional file 7, Table S4 
for a complete list). Other gene networks enriched in 
protein-coding mRNAs deregulated in metastatic tumor 
samples were "cell cycle, genetic disorder, metabolic dis- 
ease" (21 genes, p < 10~ 33 ), "cardiovascular system devel- 
opment and functions, embryonic development and 
tissue development" (21 genes, p < 10" 30 ) and "cancer, 
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tumor morphology and genetic disorder" (19 genes, p < 
10~ 29 ). IPA analysis also highlighted the prevalence of 
genes related to cell death within the metastasis-signa- 
ture. It comprised 42 protein-coding mRNAs related to 
apoptosis (19 down-regulated and 23 up-regulated), in 
line with the notion that perturbation of the normal 
programmed cell death is involved in the metastatic 
phenotype in pancreatic cancer [6,55]. Interestingly, we 
found 6 intronic IncRNAs mapped to locus of apoptosis- 
related genes among those present in the metastasis sig- 
nature (A TF2, TGFpR2, MAP2K5, MAP3K1, DAPK1 
and PTEN). 

Metastasis-associated intronic IncRNAs are expressed with 
antisense and/or sense orientation relative to 
corresponding protein-coding genes 

To document in more detail the structure of lnc RNAs 
mapping intronic regions of gene loci related to the 
MAPK pathway or to apoptosis, which were over-repre- 
sented in the metastasis-signature, we investigated their 
orientation relative to the corresponding protein-coding 
mRNA. Orientation-specific RT-PCR was employed to 
determine the strandedness of the eleven intronic tran- 
scripts mapping to introns of MAPK pathway or/and 
apoptosis-related genes, namely ARRB1, ATF2, MAPK1, 
MAP2K5, MAP3K1, MAP3K14, PPP3CB, RAPGF2, 
TGF/3R2, DAPK1 and PTEN. These experiments were 
performed using total RNA isolated from pancreatic 
tumor tissue samples or from cultured MIA PaCa-2 
cells. Ten transcripts showed evidence of being 



transcribed with the same (sense) orientation of the cor- 
responding protein-coding mRNA in both pancreatic 
tissue samples and MIA PaCa-2 cells (Figure 5). Inter- 
estingly, a transcript with antisense orientation relative 
to the protein-coding mRNA was detected in an intro- 
nic region of PPP3CB in MIA PaCa-2 cells. When RNA 
from pancreatic tumors was used, antisense intronic 
transcription was detected in three additional loci 
{ATF2, TGFBR2 and MAP3K1), which produced both 
sense and antisense messages (Figure 5). 

The relative abundance of the eleven intronic 
IncRNAs identified in genes from the MAPK pathway or 
related to apoptosis was evaluated by quantitative Real- 
Time PCR in RNA samples isolated from primary 
tumors and distant metastasis. We initially measured 
the abundance of each of the 11 intronic transcripts in 
three samples of primary adenocarcinoma and three 
samples of metastasis. In spite of a great variability due 
to small sample size, 7 out 11 intronic transcripts 
showed a similar expression change (same direction) as 
measured in the microarray. Since the amount of RNA 
from clinical samples were limiting, we selected for 
further validation in additional samples three intronic 
IncRNAs, being one antisense (PPP3CB) and two with 
the same orientation (MAP3K14 and DAPK1) relative to 
the protein-coding gene. As shown in Figure 6, statisti- 
cally significant increased expression of all three intronic 
IncRNAs was observed. 

We next asked if the expression changes of these 
intronic transcripts would reflect in the expression of 
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Figure 5 LncRNAs deregulated in pancreatic cancer metastasis map to intronic regions of genes associated with the MAPK pathway 
and/or related to apoptosis. Transcriptional orientation of eleven intronic IncRNAs mapping to MAPK pathway and/or apoptosis-related gene 
loci {ARRB1, ATF2, MAPK1, MAP2K5, MAP3K1, MAP3K14, PPP3CB, RAPGF2, TGFf}R2, DAPK1 and PTEN) was investigated by strand-oriented 
reverse transcription followed by PCR. For each gene, sense (S) or antisense (AS) transcription was measured. Controls (C) for the absence of self- 
annealing during reverse transcription (RT) were obtained by performing RT reactions in the absence of primers (RT+, Primer -). Controls for the 
absence of genomic DNA contamination were obtained by omitting reverse transcriptase in the RT reaction (RT-, Primer +). 
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Figure 6 Expression changes of intronic IncRNAs mapping to PPP3CB, MAP3K14 and DAPK1 loci are not accompanied by changes in 
the corresponding protein-coding mRNAs. Relative levels of intronic IncRNAs and protein-coding mRNAs from PPP3CB (A), MAP3K14 (B) and 

DAPK1 (C) loci were determined by Real-Time PCR in clinical samples of primary adenocarcinoma (circles) or distant metastasis (squares) from 
pancreatic cancer patients. For each transcript, the number of tested samples is indicated in the x axis. For each gene, results are expressed as 
fold-change relative to the average expression of primary tumor samples. For each sample, qPCR assays were performed in triplicate and mean 
values are shown. House-keeping gene HMBS was used as the endogenous control for normalization across patient samples. All intronic 
transcripts (left panels) were differentially expressed in metastatic samples at a significance threshold of p< 0.05 {PPP3CB, p = 0.0259; MAP3K14, 
p = 0.0069; DAPK1, p = 0.0035). Protein-coding transcripts (right panels) were not significantly differentially expressed. 



the corresponding protein-coding mRNA. Quantitative 
RT-PCR experiments with primers interrogating pro- 
tein-coding mRNAs from PPP3CB, MAP3K14 and 
DAPK1 loci showed no statistically significant expression 



change between primary tumors and metastasis samples. 
To investigate the co-expression of protein-coding 
mRNAs and intronic IncRNAs from the same loci we 
measured the Pearson correlation of their expression 
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measurements in all samples tested. A high Pearson cor- 
relation (r = 0.78, p < 0.013) was observed for the 
MAP3K14 locus, suggesting that the intronic sense tran- 
script may be a by-product of pre-mRNA processing of 
the protein-coding transcript (see discussion for details). 
No significant correlation between protein-coding 
mRNAs and intronic IncRNAs was observed for the 
other two loci (p > 0.05), leaving open the possibility 
that intronic RNAs mapping to PPP3CB (antisense) and 
DAPK1 (sense) loci are noncoding RNAs originated 
from independent transcriptional events. 

Discussion 

In this work we investigated gene expression profiles 
from clinical samples of pancreatic cancer using a cus- 
tom cDNA microarray enriched in probes that interro- 
gate long potentially noncoding RNAs mapping to 
intronic and intergenic regions of the human genome, 
plus a collection of protein-coding genes previously 
associated with cancer in the literature. By comparing 
expression profiles of 38 pancreatic clinical samples with 
four distinct tissue histologies (primary adenocarcinoma, 
adjacent non- tumor tissue, chronic pancreatitis, metasta- 
sis), we detected in all types of pancreatic tissues studied 
a proportion of intronic and intergenic transcripts com- 
parable to the one observed for protein-coding mRNAs. 
There are several reports of aberrant expression of 
microRNAs [21-24], but this is to our knowledge the 
first time that the expression of IncRNAs has been stu- 
died in pancreatic cancer. 

We observed that most intronic and intergenic tran- 
scripts expressed in pancreatic tissues have little or no 
coding potential (96% of total). Comparison with 
sequence contigs resulting from the assembly of EST/ 
mRNA data produced in our group [26] showed that 
these transcripts have a mean size of at least 779 nt, 
being longer than the EST probes deposited in the 
microarrays, which represent indeed only parts of longer 
noncoding RNAs transcribed from intronic regions. 
Most putative intergenic transcripts (-81%) were located 
more than 1 kb apart from an UTR of an annotated 
gene, suggesting that for the most part, these are indeed 
intergenic transcripts rather than uncharacterized 
untranslated regions of incomplete mRNAs. 

While it is clear that IncRNAs may exert diverse cellu- 
lar functions through multiple molecular mechanisms 
[12,56,57], it has been suggested that a fraction of the 
transcriptome noncoding complement may correspond 
to transcriptional noise resulting from RNA polymerase 
activity in regions of open chromatin or intronic seg- 
ments of processed mRNAs [58]. Our expression mea- 
surements of intronic IncRNAs do not permit to 
distinguish between i) intron lariats resulting from spli- 
cing of a pre-mRNA or ii) independent transcriptional 



units located within intron-annotated genomic regions. 
We have focused on poly(A+)-selected RNA fractions 
followed by oligo-dT primed reverse transcription to 
minimize the chance of labeling targets from non-polya- 
denylated spliced lariats. We argue that the identifica- 
tion of subsets of transcripts that map to intronic 
regions and whose steady-state levels allows the detec- 
tion by microarrays indicate that these are not rapidly 
turned-over intron lariats. We have also performed a 
series of analysis to obtain additional evidence to sup- 
port the notion that intronic/intergenic IncRNAs 
detected in pancreatic tissues are indeed bona fide cellu- 
lar transcripts, as discussed below. 

We first sought independent confirmation of intronic/ 
intergenic IncRNA expression using RNAseq data gener- 
ated from 9 distinct tissue libraries [30]. We found that 
approximately 80% of intronic/intergenic IncRNAs 
detected in pancreatic tissues were also detected in at 
least one RNAseq library (Figure 1). Most transcripts 
confirmed by the RNAseq data were detected i) only in 
a single tissue type other than pancreas, or ii) in all 9 
tissue libraries plus pancreas, indicating the prevalence 
of subsets of noncoding transcripts with broad or speci- 
fic tissue-type expression patterns, respectively (Figure 
1). 

While only a fraction of the intronic/intergenic 
IncRNAs expressed in pancreatic tissues overlapped evo- 
lutionarily conserved DNA elements in vertebrates, 
mammals and primates, we observed a significant 
enrichment (p < 0.05) compared to randomly selected 
control regions. This result suggests that at least a frac- 
tion of these IncRNAs are under purifying selection in 
the vertebrate lineage and therefore must be biologically 
functional. For the remaining transcripts, absence of 
sequence conservation should not be taken as evidence 
of no biological relevance, since it is known that well- 
characterized functional IncRNAs are poorly conserved 
across their global sequence [59]. 

As proposed by Washielt et al. [60], mapping con- 
served RNA secondary structure may lead to the discov- 
ery of novel functional IncRNAs. We found that a small 
fraction of IncRNAs expressed in pancreas (15%. i.e. 49/ 
335) are predicted to form stable structural domains 
that could be important for their processing or biologi- 
cal function. It is well documented in the literature that 
small regulatory RNAs can be generated by processing 
of long RNA precursors transcribed from intronic and 
intergenic regions of the genome [56]. To ask what frac- 
tion of our set of IncRNAs expressed in pancreatic tis- 
sues could be precursor of small RNAs we compared 
their sequences to those of known microRNA and 
snoRNA [32,33]. Only a discrete overlap was found, 
indicating that long intronic/intergenic transcripts are 
predominantly not precursors of known microRNAs/ 
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snoRNAs, yet leaving open the possibility that these 
transcripts may represent precursors of uncharacterized 
novel small RNAs. 

We found significant enrichment of H3K4me3, a pro- 
moter-associated chromatin mark frequently found in 
RNA Pol II transcribed regions [35,61], in the vicinity 
(up to 2 kb) of intronic (p < 0.05) noncoding transcripts 
as compared to randomly selected genomic DNA 
sequences. A comparable H3K4me3 enrichment was 
observed nearby known protein-coding transcripts, sug- 
gesting that transcription of protein-coding mRNAs and 
intronic IncRNAs initiates at promoter regions with 
similar chromatin contexts. We also observed a signifi- 
cant enrichment of CAGE tags proximal to known start 
sites of intronic IncRNAs expressed in pancreatic tis- 
sues, corroborating the notion that at least a fraction of 
these is independent transcriptional units. Since pan- 
creatic tissues were absent from the study that gener- 
ated the CAGE tags used for cross-reference, these 
results possibly underestimate the co-localization of 
intronic/intergenic IncRNAs with bona fide transcription 
start sites of capped transcripts. 

Differently from protein-coding mRNAs, we did not 
find significant enrichment of CpG island in the vicinity 
of intronic and intergenic RNA sequences expressed in 
pancreatic tissues. Based on this observation, we pro- 
pose that methylation of CpG islands is not involved in 
the transcriptional regulation of most intronic/intergenic 
IncRNAs expressed in pancreatic tissues. Nonetheless, 
the full set of observations regarding the structure, con- 
servation and genomic context argues that at least a 
fraction of intronic/intergenic transcripts detected in 
pancreatic tissues are independent transcriptional units 
rather than transcriptional noise originated from ran- 
dom Pol II firing [62], prompting us to investigate in 
more detail their relative expression levels in tumor and 
non-tumor pancreatic tissues. 

Differential expression of intronic IncRNAs in prostate 
and renal cancer has already been documented [17,28]. 
Here we extend these observations to pancreatic cancer, 
asking whether there were sets of intronic/intergenic 
IncRNAs deregulated in clinical samples of pancreatic 
tumor. Comparing expression profiles from primary 
tumors with samples from histologically non-malignant 
pancreatic tissue and chronic pancreatitis (CP) we iden- 
tified a 147-gene signature correlated with primary pan- 
creatic tumor. This strategy was devised to favor the 
identification of tumor specific markers rather than 
transcripts associated with the stromal cell component, 
which is augmented in both tumor and CP samples 
[36,37]. We sought to validate the pancreatic cancer 
expression signature by performing a meta-analysis with 
published gene expression studies of pancreatic cancer. 
Only 23% of the protein-coding mRNAs present in our 



pancreatic cancer signature were also identified in other 
reports. This modest overlap can be accounted for by 
differences in platforms and the heterogeneity of pan- 
creatic tumor samples. Notwithstanding, we observed a 
high agreement (17/24, 71%) between the expression 
changes measured in our signature and those retrieved 
from published data, which provides independent sup- 
port for our result and validates our sample set and 
methodological approach. This set included genes 
already reported in the literature as differentially 
expressed in pancreatic cancer and that have been inves- 
tigated as biomarkers for pancreatic cancer (i.e. S100A6 
[47], SIOOP [46], TIMP1 [63] and NF-nB [64]). In agree- 
ment with previous findings [5], the analysis of gene 
enriched categories in the pancreatic cancer expression 
signature indicated the over-representation of genes 
involved in focal adhesion. Over-representation of focal 
adhesion genes in the pancreatic cancer signature is sug- 
gestive that deregulation of genes encoding proteins 
involved in the connection and signaling to the extracel- 
lular matrix plays an important role in the malignant 
transformation and/or maintenance of pancreatic adeno- 
carcinomas. This set included integrin beta 5 (ITGB5), 
which we found to be upregulated in pancreatic adeno- 
carcinoma. Itgb5 protein has been investigated as diag- 
nostic biomarker in non-small cell lung cancer [65] and 
is target of the inhibitor drug EMD121974, which is 
under clinical trial [66]. Thus, ITGB5 is an attractive 
candidate to be tested as biomarker and/or new drug 
target in pancreatic cancer. 

Interestingly, a significant fraction (29%) of the 147- 
gene signature correlated with primary pancreatic tumor 
was comprised by IncRNAs mapping to intronic or 
intergenic regions, suggesting that noncoding RNAs 
could exert roles related to tumorigenesis of pancreatic 
cancer. This result prompted us to investigate the exis- 
tence of subsets of IncRNAs with expression levels 
altered in metastatic samples. 

We identified a statistically significant metastasis sig- 
nature of 355 differentially expressed transcripts that 
includes 220 protein-coding genes, 134 intronic/inter- 
genic transcripts and 6 known IncRNAs (Figure 4 and 
Additional file 5, Table S3). In addition to protein-cod- 
ing genes previously shown to be deregulated in pan- 
creatic metastasis (7 out of 19), the metastasis signature 
comprises known genes already associated to metastasis 
in other types of cancer (Additional file 7, Table S4), 
thus pointing to potentially interesting candidates for 
testing as new targets for treatment of the metastatic 
disease in pancreatic cancer. 

The significant fraction of IncRNAs in the metastasis 
signature (38% of total) suggests that deregulation of 
these IncRNAs could also be associated with the meta- 
static process. Expression changes of protein-coding 
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mRNAs from genes of the MAPK pathway has already 
been described in pancreatic carcinoma [67-69]. Here 
we found 9 intronic IncRNAs mapped to genes corre- 
lated to the MAPK pathway in the metastasis signature. 
We also identified expression changes in gene loci 
related to apoptosis, including 42 protein-coding 
mRNAs and 6 intronic IncRNAs; this pathway was one 
out of 12 described by Jones et al. [6] as genetically 
altered in pancreatic cancer. Four intronic IncRNAs 
belong to both categories. These results prompted us to 
document in more detail the nature of the 11 transcripts 
mapping to intronic regions of gene loci associated with 
the MAPK pathway or related to apoptosis, i.e., their 
relative orientation to the corresponding protein-coding 
mRNAs. 

Strand-specific RT-PCR assays using RNA aliquots 
from tumor tissue samples showed that 4 intronic tran- 
scripts have antisense orientation relative to the protein- 
coding mRNA: PPP3CB, ATF2, TGFBR2 and MAPK1. 
Antisense transcripts originated in PPP3CB intronic 
regions were also detected in MIA PaCa-2 cells. The 
antisense orientation relative to the corresponding pro- 
tein-coding mRNA provide strong evidence to support 
that these noncoding RNAs are produced from indepen- 
dent transcriptional units, possibly under control of a 
different promoter region. 

Transcripts mapping to intronic regions with the same 
orientation of the corresponding protein-coding mRNA 
were detected in the ATF2, TGFBR2 and MAPK1, as 
well as in the 7 other gene loci tested (ARRB1, 
MAP3K1 , MAP3K14, MAP2K5, PTEN, DAPK1 and 
RAPGF2), in both tissue and MIA PaCa-2 RNA samples. 
These sense-oriented intronic transcripts could indeed 
be bona fide RNAs originated from independent tran- 
scription, but also result from reverse transcription of 
unprocessed mRNA precursors or of stable RNA lariats 
generate during pre-mRNA splicing. Further experi- 
ments will be necessary to determine the precise nature 
of these sense-oriented intronic RNAs. 

The relative abundance of two sense (DAPK1, 
MAP3K14) and one antisense-oriented (PPP3CB) intro- 
nic transcripts in samples of primary pancreatic adeno- 
carcinoma and pancreatic metastases was independently 
accessed by qRT-PCR, confirming the results measured 
in the microarray hybridizations. Four additional intro- 
nic IncRNAs showed concordant results between qRT- 
PCR and the microarrays (ARRB, RAPGF2, ATF2 and 
PTEN). Expression changes of 4 intronic IncRNAs were 
not concordant between qRT-PCR and microarray 
(MAP3K1 , TGFBR2, MAP2K5 and MAPK1). The 
amount of RNA and the number of patient tissue sam- 
ples available for the qRT-PCR experiments were limit- 
ing, and the marginally significant and non-validated 
IncRNA candidates were tested only in few samples in 



an initial round of validation. It is possible that some of 
the intronic IncRNA candidates that failed the initial 
round of validation would still be validated as differen- 
tially expressed if tested in additional tissue samples. 
However, an alternative explanation for the non-valida- 
tion of some candidates is the presence of array hybridi- 
zation artifacts such as cross-hybridization or target 
amplification biases. 

Intragenic IncRNAs have been shown to modulate in 
cis the expression of mRNAs expressed in the same 
locus [29,70,71]. We measured the relative abundance of 
mRNAs produced in the PPP3CB, DAPK1 and 
MAP3K14 loci in the same samples and did not observe 
statistically significant expression differences between 
primary tumors and metastasis. This result indicates 
that intronic RNAs produced in these loci do not affect 
in cis the abundance of the corresponding protein-cod- 
ing transcripts. This conclusion is also supported by the 
absence of significant correlation between expression 
levels of protein-coding and noncoding RNAs originat- 
ing from PPP3CB and DAPK1 loci. The possibility that 
intronic IncRNAs differentially expressed in metastatic 
samples may exert regulatory functions acting in trans 
is compelling and warrants further studies. 

It has been shown that a significant portion of the 
noncoding component of the human transcriptome is 
comprised of non-polyadenylated RNAs [10]. We note 
that our analysis was limited to the set of IncRNAs 
interrogated by the array platform (Table 1) and by the 
use of poly(A+)-enriched RNA, and therefore is not 
comprehensive in terms of describing the full comple- 
ment of IncRNAs expressed in pancreatic tissues. Thus, 
additional studies using unbiased approaches such as 
RNAseq or tiling arrays will be required to catalog all 
poly(A+) and poly(A-) transcripts expressed in pancrea- 
tic tissues with distinct degrees of malignancy and for 
the identification of novel regulatory IncRNA candidates 
involved in the malignant transformation and tumor 
progression. 

Conclusions 

In this work we report that noncoding RNAs originating 
from intronic and intergenic genomic regions are 
expressed in tumor and non-tumor pancreatic tissues. 
Enrichment of promoter-associated chromatin marks 
plus the observation of antisense orientation of intronic 
transcripts relative to mRNAs expressed from the same 
loci provide evidence that these messages are not by- 
products of random transcription or pre-mRNA splicing 
but rather, are independent transcriptional units. 
Further investigation will be required to determine the 
biogenesis of these IncRNAs. 

We identified gene expression signatures correlated to 
primary and metastatic stages of pancreatic cancer, 
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which in addition to protein-coding mRNAs comprise 
collections of long intronic and intergenic noncoding 
RNAs. Further studies will be necessary to reveal possi- 
ble biological functions and molecular mechanisms 
exerted by these IncRNAs in tumorigenesis and/or pro- 
gression of pancreatic tumors. 

In summary, our work contributes with novel candi- 
date biomarkers of pancreatic cancer and highlights the 
importance of investigating the biological relevance of 
long noncoding RNAs in order to fully understand the 
molecular basis of the disease. 

Methods 

Patient Samples and Cell Lines 

A total of 38 pancreatic samples stored in freshly-frozen 
tissue collections were obtained with informed consent 
from patients seen at Hospital das Clmicas, Faculdade 
de Medicina da Universidade de Sao Paulo (HC- 
FMUSP). Primary tumor tissues (T) were obtained from 
15 patients with no evidence of metastasis. Nine samples 
of histologically normal pancreatic tissue fragments 
(NT) were dissected from non-neoplasic tissue sections 
adjacent to tumor sites. Six samples of metastases origi- 
nated from primary pancreatic tumors (M) were 
obtained from biopsies in affected organs (one from 
peritoneum, one from ganglion and four from liver tis- 
sues). Eight tissue samples from patients with chronic 
pancreatitis (CP) were also collected. All tissue sections 
were reviewed by a pathologist for histological confirma- 
tion and whenever necessary, macro-dissected to guar- 
antee that 80% or more of the sections used for gene 
expression analysis were composed of neoplastic/pan- 
creatitis tissue. 

Pancreatic carcinoma cell lines MIA PaCa-2 were 
obtained from the American Type Culture Collection 
and maintained using DEMEM supplemented with 10% 
(v/v) fetal calf serum (FCS), 3 mM L-glutamine, 100 \ig/ 
ml streptomycin and 100 U/ml penicillin. 

RNA extraction and microarray target preparation 

Total RNA was extracted from pancreatic tissue samples 
(50-100 mg tissue) using Trizol (Invitrogen) according 
to manufacturer's recommendations. RNA cleanup 
including a DNase I digestion step was performed using 
RNeasy spin columns (Qiagen). RNA integrity was mea- 
sured by the relative abundance of 28S/18S ribosomal 
subunits, verified through micro fluid capillary electro- 
phoresis (Agilent Bioanalyzer 2100). 

To generate cRNA targets, 1 [ig of total RNA from 
each sample was linearly amplified in two rounds of 
reverse transcription followed by in vitro transcription 
according to Wang et al. [72]. Briefly, oligo dT-T7 was 
used to prime first-strand cDNA synthesis (Superscript 
III First Strand Synthesis - Invitrogen). After second- 



strand cDNA synthesis (cDNA Polymerase Mix - Clon- 
tech), cRNA targets were produced by in vitro transcrip- 
tion (MegaScript T7 - Ambion). In the second round of 
amplification, cRNAs produced in the first-round were 
reverse transcribed using random hexamer primers and 
used as template for in vitro transcription in the pre- 
sence of amino-allyl UTP. Prior to hybridization, cRNA 
targets were labeled by coupling with mono-reactive 
Cy5-esters (Amersham). Quantification of cDNA yield 
and dye incorporation was performed using a NanoDrop 
spectrophotometer (Thermo Scientific). Typically, 50- 
100 (ig of cRNA were obtained following two rounds of 
linear amplification. Two sets of cRNA targets were gen- 
erated from each RNA sample and independently hybri- 
dized to microarray slides. 

Microarray design and hybridization 

Construction of the spotted custom-cDNA microarray 
platform was described previously [17]. Probes were 
selected from the over 1 million EST clone collection 
generated during the Human Cancer Genome Project, a 
large-scale EST sequencing project that used cDNA 
libraries generated from poly(A) mRNA derived from 
over 20 different types of human tumors [73,74]. Tran- 
scripts from the EST dataset were annotated as protein- 
coding, putative intronic IncRNA or intergenic IncRNA 
following mapping to the human genome sequence and 
cross-referencing with genome mapping coordinates of 
annotated genes (RefSeq dataset) [17]. Intronic/inter- 
genic IncRNAs used for microarray spotting were ran- 
domly selected from the annotated EST dataset. 
Transcripts annotated as "intronic IncRNAs" comprise 
sequences that mapped within an intronic region of a 
protein-coding gene. Transcripts annotated as "inter- 
genic IncRNAs" comprise sequences that map to geno- 
mic regions devoid of any annotated gene. To be 
annotated as intronic or intergenic a given transcript 
could not overlap a genomic region spanning an exon of 
annotated protein-coding genes. "Known IncRNAs" refer 
to transcripts whose genomic coordinates overlap fully 
with the coordinates of noncoding RNAs from the 
RefSeq dataset (accession Id = NR_nnnnnn). Transcripts 
annotated as "protein-coding gene" overlapped with 
exons of protein-coding RefSeq transcripts in genomic 
space. To account for possible unannotated intron 
retention events, partial transcripts mapping to exon/ 
intron boundaries were annotated as "exonic". 

In the course of this work microarray probes were re- 
mapped to the latest version of the human genome 
(hgl9) and re-annotated to reflect updated RefSeq and 
UCSC gene models (Oct. 2010). Each glass-slide con- 
tains 3,355 cDNA fragments spotted in duplicate, plus 
positive (cDNA from housekeeping genes) and negative 
(plant and bacterial DNA) controls. Spotted cDNAs 
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comprise 722 ESTs mapping to intronic regions of well- 
annotated (RefSeq) protein-coding genes, 74 ESTs map- 
ping to known RefSeq IncRNAs, 188 ESTs mapping to 
intergenic regions of the genome. The array also con- 
tained 2,371 ESTs mapping to exons of protein-coding 
genes associated with cancer based on a literature search 
[17], comprising genes involved in apoptosis, tumorigen- 
esis, metastasis, cancer metabolism and cancer 
progression. 

For each sample, cRNA targets were ressuspended in a 
final volume of 200 \il of lx Microarray Hybridization 
Solution v.2.0 (GE Healthcare) containing 25% forma- 
mide, denaturated at 92°C for 2 minutes and incubated 
with microarrays at 42°C for 16 hours using an auto- 
mated slide processor (GE Healthcare). Following 
sequential washes in lx SSC; 0.2% SDS, O.lx SSC; 0.2% 
SDS and O.lx SSC; 0.2% SDS, microarray slides were 
scanned immediately in a Generation III Microarray 
Systems Scanner (Molecular Dynamics/GE Healthcare). 
For each sample, two slides were hybridized with differ- 
ent preparations of cRNA targets. As probes are spotted 
in duplicate in the arrays, a total of 4 replicate measure- 
ments were collected for each cDNA for each sample. 

Data processing and analysis 

Cy5-intensity measurements from hybridized targets 
were extracted from array images using the ArrayVision 
software (Imaging Research Inc.). To make the expres- 
sion values comparable across all samples tested, the 
raw data was normalized by the quantile method [75]. 
Next, for each slide, the fifty percent of probes with the 
lowest intensity values were filtered out. A mean expres- 
sion value for each probe, in each patient sample, was 
calculated when at least 3 out of 4 replicates showed 
valid measurements. Only probes with valid measure- 
ments in at least 75% of the samples in any of the histo- 
logical groups (NT, T, PA or M) were selected, resulting 
in a total of 1,607 probes for further analysis. The Com- 
Bat program was used to remove systematic variations 
in gene expression across experiments resulting from 
the use of different batches of microarrays [76]. Inter- 
slide Pearson correlations using normalized intensities 
from all probes in the array were calculated before and 
after filtering and normalization of data intensities. Raw 
data intensities showed average inter-slide correlations 
of 0.63, whereas normalized data showed inter-slide cor- 
relation of 0.83. 

Raw and normalized microarray intensities were 
deposited in the Gene Expression Omnibus database 
(GEO - http://www.ncbi.nlm.nih.gov/geo/) under acces- 
sion number GSE30134. 

Significance analysis of microarrays (SAM) approach 
[77] was employed to identify gene expression signatures 
correlated to tissue histology, with the following 



parameters: two or multi-class response, 1000 permuta- 
tions, K-Nearest Neighbors Imputer, and false discovery 
rate (FDR) < 10% or <5%. For representation of gene 
expression measurements in heat-maps, samples were 
ordered according to their individual correlation to the 
average profile of the primary tumor samples. 

Bioinformatics analyses 

We used the BEDTools software package [78] to cross- 
reference genome mapping coordinates (GRCh37 build, 
hgl9) of our dataset of intronic/intergenic noncoding 
sequences with those from the various datasets used in 
this analysis and available through the UCSC Genome 
Browser [66]: i) RNAseq data of PolyA + RNA-derived 
libraries from 9 tissues [30], ii) RIKEN CAGE tag data 
from PolyA + RNA-derived libraries from 6 cell lineages 
[79]; iii) ChlP-seq data of H3K4m3 DNA binding sites 
[35], iv) conserved DNA elements in vertebrates, mam- 
malian and primates calculated with PhastCons program 
[66], v) predicted CpG islands [80] and vi) intronic non- 
coding RNAs assembled from EST/mRNA GenBank 
data [26]. 

To test the statistical significance of the overlap 
between our dataset of intronic/intergenic IncRNAs and 
the datasets of conserved elements and regulatory motifs 
(H3K4me3, CpG islands, CAGE tags), we generated 100 
groups of randomly selected sequences from intronic or 
intergenic regions of the human genome matching in 
number, length and CG content our set of expressed 
noncoding sequences. As pre-processing of CAGE tag 
data, coordinates of overlapped tags were clustered and 
only clusters containing at least 5 tags were considered 
for further analysis. Fischers exact test was used to test 
the statistical significance (p < 0.05 threshold) of the 
enrichment of conserved DNA elements in intronic/ 
intergenic IncRNAs relative to the enrichment observed 
for the 100 random sequence sets. 

For the analysis of transcription regulatory elements, 
we first computed the distance of the closest H3K4me3 
marks, predicted CpG islands and CAGE tags to our set 
of expressed intronic/intergenic IncRNAs, expressed 
protein-coding mRNAs, and of the set of 100 random 
groups. Transcription regulatory elements mapping to 
5'UTRs of known transcripts (RefSeq and UCSC genes) 
were removed to avoid the contribution of signals at 
start sites of known genes to the enrichment of regula- 
tory elements at start sites of intronic IncRNAs mapping 
nearby. 

Only regulatory elements distant up to 10 kb of 
sequence boundaries were considered. Next, non-para- 
metric Kolmogorov-Smirnov (KS) test statistics, imple- 
mented using the Deducer package under R language 
[81] was used to compare the distance distributions of 
H3K4me3 marks, predicted CpG islands and CAGE tags 
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observed for intronic or intergenic noncoding sequences 
and those calculated for each of the 100 control random 
sets. Distance distributions of regulatory motifs from 
intronic/intergenic IncRNAs/protein-coding mRNAs 
were considered significantly different from the obtained 
by chance if all KS p-values calculated using each ran- 
dom set were smaller than 0.05. 

Protein-coding potential of intronic and intergenic 
IncRNAs was evaluated using the Coding Potential Cal- 
culator software [31] with default parameters. RNAz 
program [34] was used to predict structurally conserved 
and thermodynamically stable RNA secondary struc- 
tures. Only predicted structures with P > 0.5 were con- 
sidered as containing conserved secondary structures. 

Orientation-specific RT-PCR 

Aliquots of DNAse-treated total RNA from a pool of six 
pancreatic tumor tissues samples or from Mia PaCa-2 
cells were used as template in orientation-specific 
reverse transcription reactions. Reactions were per- 
formed with 200 ng total RNA plus 2.5 \iM of oligonu- 
cleotide primers designed to detect sense or antisense 
strand intronic transcripts, relative to the orientation of 
the mRNA from the same locus. Superscript III™ Super 
Mix (Invitrogen) was used according manufacture's 
recommendations, with the following modification: 
reverse transcription reaction was increased to 57°C to 
limit RNA self-annealing. To verify the absence of prim- 
ing due to self-annealing or genomic DNA contamina- 
tion, control reactions were performed without addition 
of primers or of reverse transcriptase, respectively. 

Real-time RT-PCR 

One microgram aliquots of DNase-treated total RNA from 
11 clinical samples of primary pancreatic tumors (5 
already used in the microarray experiments and 6 new 
samples) and 6 of distant metastases with pancreatic origin 
(4 already used in the microarray and 2 new samples) were 
reverse transcribed using Superscript III™ Super Mix kit 
(Invitrogen) and random hexamer primers according to 
manufacturer's recommendation. Relative abundance of 
selected transcripts primary tumor/metastasis samples was 
determined by real-time PCR using the ABI PRISM® 7300 
Real Time PCR System and the SYBR Green PCR Master 
Mix kit (Applied Biosystems). Reactions were performed 
in a final volume of 20 [i\ containing a 5 \i\ aliquot of 
diluted cDNA (1:7) and 1 \iM of forward and reverse 
gene-specific primers. Expression levels of hydroxymethyl- 
bilane synthase (HMBS) [82] appeared to be constant and 
was used as a reference gene to make expression measure- 
ments comparable across all different samples tested. For 
quantitative results, the level of each transcript was nor- 
malized by the level of HBMS, and represented as fold 
change using the 2" AACt method [83], where AACt = (Ct 



candidate gene in sample x - Ct reference gene in sample 
X)sam P ie - mean AC t of all primary tumor samples tested. 

Additional material 



Additional File 1: Figure SI: Intronic/intergenic IncRNA are enriched 
in conserved DNA elements. Genomic coordinates of conserved DNA 
elements identified in Vertebrate, Placental or Primate sequences (see 
Methods for details) were cross-referenced to those from IncRNAs 
expressed in pancreatic tissue (blue bars). These presented a higher 
overlap with evolutionary conserved DNA elements than the observed 
for a random set of genomic DNA sequences with same length and CG 
content (gray bars) (Fisher's test p <0.05). Bar heights indicate the 
number of IncRNAs that overlap conserved DNA elements in each 
taxonomic group divided by the total number of conserved DNA 
elements present in each group. 

Additional File 2: Table SI: List of transcripts differentially 
expressed in PDAC relative to chronic pancreatitis and non tumor 
tissue samples combined. 

Additional File 3: Table S2: Validation of protein-coding genes 
differentially expressed in PDAC by meta-analysis of published 
data. 

Additional File 4: Figure S2: Genes modulated in pancreatic cancer 
are involved in cellular movement, cell-to-cell signaling interactions 
and endocrine system. Ingenuity Pathway Analysis was used to identify 
gene networks over-represented among the set of 104 protein-coding 
genes differentially expressed in PDAC samples. The most significantly 
enriched network (p = 1(T 43 ) is comprised by 23 differentially expressed 
genes measured in the microarrays. Red indicates higher expression, and 
green, lower expression in tumor tissues relative to chronic pancreatitis 
and adjacent non-tumor tissue samples combined. 

Additional File 5: Table S3: List of genes differentially expressed 
between PDAC and metastatic tissue samples. 

Additional File 6: Figure S3: Genes modulated in metastatic 
pancreatic cancer are involved in cellular movement, gene 
expression and immune cell trafficking. Ingenuity Pathway Analysis 
was used to identify gene networks over-represented among the set of 
221 protein-coding genes differentially expressed in PDAC relative to 
metastasis tissue samples. The most significantly enriched network (p= 
1CT 41 ) is comprised by 24 differentially expressed genes measured in the 
microarrays. Red indicates higher expression, and green, lower expression 
in metastasis relative to PDAC tissue samples. 

Additional File 7: Table S4: Genes from the pancreatic cancer 
metastasis signature related to tumor aggressiveness in other 
cancer types. 
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